A Study with Class Imbalance and Random Sampling for a Decision Tree Learning System
نویسندگان
چکیده
Sampling methods are a direct approach to tackle the problem of class imbalance. These methods sample a data set in order to alter the class distributions. Usually these methods are applied to obtain a more balanced distribution. An open-ended question about sampling methods is which distribution can provide the best results, if any. In this work we develop a broad empirical study aiming to provide more insights into this question. Our results suggest that altering the class distribution can improve the classification performance of classifiers considering AUC as a performance metric. Furthermore, as a general recommendation, random over-sampling to balance distribution is a good starting point in order to deal with class imbalance.
منابع مشابه
MMDT: Multi-Objective Memetic Rule Learning from Decision Tree
In this article, a Multi-Objective Memetic Algorithm (MA) for rule learning is proposed. Prediction accuracy and interpretation are two measures that conflict with each other. In this approach, we consider accuracy and interpretation of rules sets. Additionally, individual classifiers face other problems such as huge sizes, high dimensionality and imbalance classes’ distribution data sets. This...
متن کاملA comparative study on rough set based class imbalance learning
This paper performs systematic comparative studies on rough set based class imbalance learning. We compare the strategies of weighting, re-sampling and filtering used in the rough set based methods for class imbalance learning. Weighting is better than re-sampling, and re-sampling is better than filtering. The weighted rough set based method achieves the best performance in class imbalance lear...
متن کاملExtracting Predictor Variables to Construct Breast Cancer Survivability Model with Class Imbalance Problem
Application of data mining methods as a decision support system has a great benefit to predict survival of new patients. It also has a great potential for health researchers to investigate the relationship between risk factors and cancer survival. But due to the imbalanced nature of datasets associated with breast cancer survival, the accuracy of survival prognosis models is a challenging issue...
متن کاملA weighted rough set based method developed for class imbalance learning
In this paper, we introduce weights into Pawlak rough set model to balance the class distribution of a data set and develop a weighted rough set based method to deal with the class imbalance problem. In order to develop the weighted rough set based method, we design first a weighted attribute reduction algorithm by introducing and extending Guiasu weighted entropy to measure the significance of...
متن کاملA Comparative Study of Decision Tree Algorithms for Class Imbalanced Learning in Credit Card Fraud Detection
Credit card fraud detection along with its inherent property of class imbalance is one of the major challenges faced by the financial institutions. Many classifiers are used for the fraud detection of imbalanced data. Imbalanced data withhold the performance of classifiers by setting up the overall accuracy as a performance measure. This makes the decision to be biased towards the majority clas...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2008